Skip to content

do some small optimization to ops#943

Merged
wanghan-iapcm merged 5 commits into
deepmodeling:develfrom
njzjz:optimize-ops
Feb 11, 2022
Merged

do some small optimization to ops#943
wanghan-iapcm merged 5 commits into
deepmodeling:develfrom
njzjz:optimize-ops

Conversation

@njzjz

@njzjz njzjz commented Aug 10, 2021

Copy link
Copy Markdown
Member

avoid concat or add in loops. Instead, append tensors to a list, and concat or accumulate_n after loops

1. avoid concat or add in loops. Instead, append tensors to a list, and concat or accumulate_n after loops
2. remove a duplicated reshape
@codecov-commenter

codecov-commenter commented Aug 10, 2021

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.68%. Comparing base (0d8fe0a) to head (30f8e7c).

Files with missing lines Patch % Lines
deepmd/fit/polar.py 50.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #943      +/-   ##
==========================================
+ Coverage   75.67%   75.68%   +0.01%     
==========================================
  Files          92       92              
  Lines        7671     7671              
==========================================
+ Hits         5805     5806       +1     
+ Misses       1866     1865       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@amcadmus amcadmus requested a review from denghuilu August 10, 2021 02:01
@amcadmus

Copy link
Copy Markdown
Member

Have you benchmarked these optimization? Do they help improving the efficiency?

@njzjz njzjz marked this pull request as draft August 10, 2021 02:40
@njzjz

njzjz commented Aug 10, 2021

Copy link
Copy Markdown
Member Author

Have you benchmarked these optimization? Do they help improving the efficiency?

I just benchmarked it. The answer is no😂

@njzjz njzjz closed this Aug 10, 2021
@njzjz njzjz reopened this Jan 13, 2022
@njzjz njzjz removed the request for review from denghuilu January 13, 2022 13:50
@njzjz

njzjz commented Jan 13, 2022

Copy link
Copy Markdown
Member Author

I think these optimizations may be more important to CPUs, compared to GPUs. I will recheck this PR.

@njzjz

njzjz commented Jan 14, 2022

Copy link
Copy Markdown
Member Author

Do some profiling here:
(1) + vs accumulate_n
+ one by one has more ops than accumulate_n once.

image
+

image
accumulate_n

(2) concat
image

@njzjz njzjz marked this pull request as ready for review January 14, 2022 07:36
Comment thread deepmd/descriptor/se_a.py
bavg = bavg,
trainable = trainable,
suffix = "_"+str(type_i))
if type_i == 0:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we have a bug here? if type_i == 0 and (type_input, type_i) in self.exclude_types we had ret accumulated.

@wanghan-iapcm

wanghan-iapcm commented Jan 15, 2022

Copy link
Copy Markdown
Collaborator

@denghuilu Would the revised code be faster on GPUs?

@njzjz

njzjz commented Jan 16, 2022

Copy link
Copy Markdown
Member Author

I think one cannot see any difference if there are only one or two elements. A system with at least 10 atom types should be tested.

@denghuilu

Copy link
Copy Markdown
Member

There is a slight performance penalty on V100 GPU with the water benchmark system:

optimize-ops branch


DEEPMD INFO    batch     100 training time 3.36 s, testing time 2.34 s
DEEPMD INFO    batch     200 training time 1.73 s, testing time 2.32 s
DEEPMD INFO    batch     300 training time 1.75 s, testing time 2.32 s
DEEPMD INFO    batch     400 training time 1.73 s, testing time 2.41 s
DEEPMD INFO    batch     500 training time 1.72 s, testing time 2.37 s
DEEPMD INFO    batch     600 training time 1.74 s, testing time 2.36 s
DEEPMD INFO    batch     700 training time 1.76 s, testing time 2.43 s
DEEPMD INFO    batch     800 training time 1.77 s, testing time 2.48 s
DEEPMD INFO    batch     900 training time 1.75 s, testing time 2.47 s
DEEPMD INFO    batch    1000 training time 1.72 s, testing time 2.41 s

devel branch

DEEPMD INFO    batch     100 training time 3.03 s, testing time 0.02 s
DEEPMD INFO    batch     200 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.63 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 1.62 s, testing time 0.02 s
DEEPMD INFO    batch     700 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     900 training time 1.60 s, testing time 0.02 s

Maybe the GPU implementation did not use the stream parallelization.

@wanghan-iapcm

Copy link
Copy Markdown
Collaborator

There is a slight performance penalty on V100 GPU with the water benchmark system:

optimize-ops branch


DEEPMD INFO    batch     100 training time 3.36 s, testing time 2.34 s
DEEPMD INFO    batch     200 training time 1.73 s, testing time 2.32 s
DEEPMD INFO    batch     300 training time 1.75 s, testing time 2.32 s
DEEPMD INFO    batch     400 training time 1.73 s, testing time 2.41 s
DEEPMD INFO    batch     500 training time 1.72 s, testing time 2.37 s
DEEPMD INFO    batch     600 training time 1.74 s, testing time 2.36 s
DEEPMD INFO    batch     700 training time 1.76 s, testing time 2.43 s
DEEPMD INFO    batch     800 training time 1.77 s, testing time 2.48 s
DEEPMD INFO    batch     900 training time 1.75 s, testing time 2.47 s
DEEPMD INFO    batch    1000 training time 1.72 s, testing time 2.41 s

devel branch

DEEPMD INFO    batch     100 training time 3.03 s, testing time 0.02 s
DEEPMD INFO    batch     200 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.63 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 1.62 s, testing time 0.02 s
DEEPMD INFO    batch     700 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 1.58 s, testing time 0.02 s
DEEPMD INFO    batch     900 training time 1.60 s, testing time 0.02 s

Maybe the GPU implementation did not use the stream parallelization.

Why the testing time of optimize-ops is so long?

@njzjz

njzjz commented Feb 10, 2022

Copy link
Copy Markdown
Member Author

Why the testing time of optimize-ops is so long?

It was fixed by #1419 -- this branch is behind devel.

@denghuilu

Copy link
Copy Markdown
Member

It did have some benefits:

DEEPMD INFO    batch     200 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.56 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.57 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     600 training time 1.59 s, testing time 0.02 s
DEEPMD INFO    batch     700 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     800 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch     900 training time 1.60 s, testing time 0.02 s
DEEPMD INFO    batch    1000 training time 1.57 s, testing time 0.02 s

@wanghan-iapcm wanghan-iapcm merged commit 82c787d into deepmodeling:devel Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants